On morph-based LVCSR improvements
Abstract
Efficient large vocabulary continuous speech recognition (LVCSR) of morphologically rich languages is a major challenge due to rapid vocabulary growth. To improve results, various subword units, called morphs, are used as the basic language elements. The improvements over the word baseline, however, range from negative to a halving of the error rate across languages and tasks. In this paper we attempt to explore the source of this variability. Different LVCSR tasks of an agglutinative language are investigated in numerous experiments using full vocabularies. The improvements are also compared with previously published results for other languages. Important correlations are found between the morph-based improvements and both the vocabulary growth and the corpus size.
Index Terms: speech recognition, rich morphology, morph, language modeling, LVCSR
Similar resources
A bilingual study on the prediction of morph-based improvement
Morph-based language modeling has been efficiently applied in improving the accuracy of Large-Vocabulary Continuous Speech Recognition (LVCSR) systems especially in morphologically rich languages. However, the rate of improvements varies greatly and the underlying principles have been only superficially studied. Having a method that can predict the expected improvement prior to experimentations...
Investigation of morph-based speech recognition improvements across speech genres
The improvement achieved by changing the basis of speech recognition from words to morphs (various sub-word units) varies greatly across tasks and languages. We make an attempt to explore the source of this variability by the investigation of three LVCSR tasks corresponding to three speech genres of a highly agglutinative language. Novel, press conference and broadcast news transcription result...
Pars Morph: A Persian Morphological Analyzer
In this paper, the theoretical foundation, the implementation and the uses of Pars Morph, a Persian morphological analyzer, are introduced. Pars Morph is a rule-based Persian morphological analysis system, which analyzes the internal structure of words in Persian and determines the grammatical category and function of the word parts. Pars Morph being in link with a lexicon covering about 45...
Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic
In this work, Portuguese, Polish, English, Urdu, and Arabic automatic speech recognition evaluation systems developed by the RWTH Aachen University are presented. Our LVCSR systems focus on various domains like broadcast news, spontaneous speech, and podcasts. All these systems but Urdu are used for Euronews and Skynews evaluations as part of the EUBridge project. Our previously developed LVCSR...
Publication date: 2010